Received: from apakabar.cc.columbia.edu by watsun.cc.columbia.edu with SMTP id AA20608
(5.65c+CU/IDA-1.4.4/HLK for <kermit.misc@watsun.cc.columbia.edu>); Mon, 4 Sep 1995 16:00:31 -0400
Received: by apakabar.cc.columbia.edu id AA18029
(5.65c+CU/IDA-1.4.4/HLK for kermit.misc@watsun); Mon, 4 Sep 1995 16:00:30 -0400
Path: news.columbia.edu!sol.ctr.columbia.edu!howland.reston.ans.net!spool.mu.edu!bloom-beacon.mit.edu!news5.ner.bbnplanet.net!news3.near.net!sun3.ipswitch.com!ddl
From: ddl@harvard.edu (Dan Lanciani)
Newsgroups: comp.protocols.kermit.misc
Subject: Re: MS-KERMIT 3.14 hanging on idle TCP/IP connection?
Message-Id: <2979@sun3.IPSWITCH.COM>
Date: 4 Sep 95 18:58:25 GMT
References: <42d2u9$edt@apakabar.cc.columbia.edu> <42dodl$go@apakabar.cc.columbia.edu>
Organization: Internet
Lines: 69
Apparently-To: kermit.misc@watsun.cc.columbia.edu
In article <42dodl$go@apakabar.cc.columbia.edu>, chaiklin@konichiwa.cc.columbia.edu (Seth Chaiklin) writes:
|
| Joe Doupnik <jrd@cc.usu.edu> wrote:
| > Did you have a chance to look at the ARP cache on the Linux machine?
| >I've heard rumors (I don't use Linux) that it times out and can yield just
| >the effects noted. You might try pinging MSK from the Linux end as one way
| >of correcting its ARP cache.
|
| You are definitely on the right track (and thanks for the fast response!).
|
| I tried an experiment. I let the MSK machine sit idle while
| connected to the Linux machine, and after 10 minutes (while true;
| do date; arp -a; sleep 60; done), I discovered that the Linux arp
| cache loses the HW address of the ethernet card, at which point,
| of course, the MSK machine appears to be frozen.
Note that most implementations intentionally time out ARP entries; this
is a feature. I doubt that the entry is lost as such; more likely it simply
expired, though timeouts are usually a bit longer than ten minutes. You may
be looking at an ARP bug in Linux or kermit involving bad behavior when one
side already knows the address.
These kinds of bugs come up more often than you might imagine since
the ARP process for mainly-client programs is usually one way and the
reverse process may be only lightly tested. Keep in mind that the answerer
of an ARP request also retains the address of the caller to avoid sending
an ARP itself. Starting with both machines ignorant of the hardware
addresses, the process might go like this:
	kermit -> ARP-REQUEST  -> Linux   (saves kermit's hardware address)
	Linux  -> ARP-RESPONSE -> kermit
Since this is the most common sequence, kermit probably doesn't have to
answer ARP requests at all most of the time.
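
The exchange above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the `Host` class, its method names, and the addresses are hypothetical and are not taken from kermit or Linux. The point it shows is that the answerer learns the requester's address as a side effect of the request, so it never has to send an ARP request of its own.

```python
class Host:
    """Toy host with an ARP cache (IP address -> hardware address)."""

    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.cache = {}
        self.requests_answered = 0

    def handle_arp_request(self, sender_ip, sender_mac, target_ip):
        """Process a broadcast ARP request; return our MAC if we are the target."""
        if target_ip != self.ip:
            return None
        # The answerer saves the caller's address so it need not ARP back.
        self.cache[sender_ip] = sender_mac
        self.requests_answered += 1
        return self.mac

    def resolve(self, peer):
        """Return the peer's MAC, sending an ARP request only on a cache miss."""
        if peer.ip not in self.cache:
            reply = peer.handle_arp_request(self.ip, self.mac, peer.ip)
            if reply is None:
                raise RuntimeError("no ARP response")
            self.cache[peer.ip] = reply
        return self.cache[peer.ip]


# Hypothetical addresses, chosen only for the example.
kermit = Host("192.0.2.10", "02:00:00:00:00:0a")
linux = Host("192.0.2.20", "02:00:00:00:00:14")

kermit.resolve(linux)   # kermit ARPs first; Linux answers and learns kermit
linux.resolve(kermit)   # cache hit: Linux never sends an ARP request
```

In this model `kermit.requests_answered` stays at zero, which is the situation in which a broken ARP responder on the kermit side would go unnoticed.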
| I tried pinging the MSK machine from the Linux machine, but it
| does not respond. However, if I hand-entered the HW address for
| the MSK machine, then deleted this entry from the arp cache, and
| then added it again, I could reestablish input/output being shown
| on the MSK machine, and everything seems to work as it should.
You'd need a network trace to be sure, but this suggests that kermit
isn't responding to ARPs in its current state. (It could also be that
Linux isn't sending them at all, but that would be such a devastating
error that it would have been noticed long ago. I hope.) I think there
are at least two additional experiments that might shed light on the situation.
First, while kermit is in the bad state, try to ping it from another machine
that has never been involved with the connection at all. This should tell
you whether kermit is willing to respond to anybody's ARP at this point.
If it doesn't respond then it has somehow been corrupted (or doesn't
respond to ARPs in general). If it does respond then it may be that kermit
has a problem answering ARPs when it already knows the peer's hardware
address. If it does not respond, move on to the next test:
Start kermit fresh and don't connect to anything (I assume you can do this
and still have the tcp running?). Now try to ping kermit from a machine
which has no ARP entry for kermit. If this works and the first test failed
then it is likely the program is becoming corrupted somehow. If the second
test fails then kermit doesn't respond to ARPs at all (seems unlikely) or
you have some obscure problem with broadcasts and/or frame types that is
blocking ARPs in one direction. (Don't laugh; I've seen it.)
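
The reasoning behind the two tests can be written down as a small decision function. This is only a restatement of the logic above in Python; the function name, its parameters, and the wording of the result strings are made up for illustration.

```python
def interpret(bad_state_ping_ok, fresh_ping_ok=None):
    """Map the outcomes of the two ping experiments to a likely cause.

    bad_state_ping_ok: did kermit answer a ping from an uninvolved machine
                       while it was in the hung state?
    fresh_ping_ok:     did a freshly started kermit answer a ping from a
                       machine with no ARP entry for it? (None = not run)
    """
    if bad_state_ping_ok:
        return "kermit may fail to answer ARPs from a peer whose address it already knows"
    if fresh_ping_ok is None:
        return "inconclusive: run the second test"
    if fresh_ping_ok:
        return "kermit is probably becoming corrupted during the session"
    return "kermit never answers ARPs, or broadcasts/frame types are blocked one way"


# Example: first test failed, second test worked.
print(interpret(False, True))
```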
The general idea is that things can work remarkably well with ARPs functioning
in only one direction and it takes something like a short cache timeout to
bring the problem to light. Consider that one end could be totally incapable
of receiving broadcasts (bad NIC, bad driver, etc.) and it would still appear
to function normally as long as it always ARP'ed first and the peer had a long
timeout.
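
To make that last point concrete, here is a rough simulation in Python. Everything in it is an assumption made for the sketch: the `Host` class, the `hears_broadcasts` flag, the fixed expiry time per cache entry (real implementations may refresh entries on traffic), and the numbers. It models one host that cannot receive broadcasts but always ARPs first, and reports when the peer first fails to reach it.

```python
class Host:
    def __init__(self, name, ttl, hears_broadcasts=True):
        self.name = name
        self.ttl = ttl                      # cache lifetime, simulated seconds
        self.hears_broadcasts = hears_broadcasts
        self.cache = {}                     # peer name -> expiry time

    def knows(self, peer, now):
        return self.cache.get(peer.name, -1) > now

    def send(self, peer, now):
        """Try to send one packet to peer at time `now`; True on success."""
        if not self.knows(peer, now):
            # Cache miss: we must broadcast an ARP request.
            if not peer.hears_broadcasts:
                return False                # request never heard, no reply
            peer.cache[self.name] = now + peer.ttl   # answerer learns caller
            self.cache[peer.name] = now + self.ttl   # unicast reply received
        return True


def first_failure(peer_ttl, horizon=3600, step=60):
    """Return the time at which the peer first cannot reach the deaf host."""
    deaf = Host("kermit", ttl=2 * horizon, hears_broadcasts=False)
    peer = Host("linux", ttl=peer_ttl)
    if not deaf.send(peer, 0):              # the deaf host ARPs first
        return 0
    for now in range(step, horizon + 1, step):
        if not peer.send(deaf, now):
            return now
    return None                             # no failure within the horizon


print(first_failure(peer_ttl=600))     # short timeout on the peer
print(first_failure(peer_ttl=7200))    # long timeout on the peer
```

With a ten-minute timeout on the peer the model fails at 600 seconds; with a timeout longer than the simulated hour it never fails, even though the first host could not have answered a single ARP request.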
Dan Lanciani
ddl@harvard.*